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Abstract 

We present a flexible stochastic model for a class of cooperative wireless relay networks, in which the relay 
processing functionality is not known at the destination. In addressing this problem we develop efficient 
algorithms to perform relay identification in a wireless relay network. We first construct a statistical 
O . model based on a representation of the system using Gaussian Processes in a non-standard manner due 
to the way we treat the imperfect channel state information. We then formulate the estimation problem 
t> ' to perform system identification, taking into account complexity and computational efficiency. Next we 
develop a set of three algorithms to solve the identification problem each of decreasing complexity, 
, trading-off the estimation bias for computational efficiency. The joint optimisation problem is tackled 
\Q ■ via a Bayesian framework using the Iterated Conditioning on the Modes methodology. We develop a 

o 

lower bound and several sub-optimal computationally efficient solutions to the identification problem, 
for comparison. We illustrate the estimation performance of our methodology for a range of widely 
used relay functionalities. The relative total error attained by our algorithm when compared to the lower 
bound is found to be at worst 9% for low SNR values under all functions considered. The effect of the 
relay functional estimation error is also studied via BER simulations and is shown to be less than 2 dB 
worse than the lower bound. 

Keywords: System identification, Gaussian processes, Kernel methods, Iterated conditioning on the 
modes, Relay networks, Co-operative wireless relay network. 
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I. Introduction to Relay System Identification 

Cooperative communications systems have been proposed to exploit the spatial diversity gains in- 
herent in multiuser wireless systems without the need of multiple antennas at each node, see [1] and [2]. 
This is achieved by having the users relay each others messages and, thus, forming multiple transmission 
paths to the destination. Simply put, such a system broadcasts a signal from a transmitter at the source 
through a wireless channel. The signal is then received by each relay node and a relay strategy is applied 
before the signal is retransmitted to the destination. A number of relay strategies have been studied in 
the literature [2], [3]. The relay function can be optimized for different design objectives [4], [5]. For 
example, in the estimate and forward (EF) scheme, in the case of BPSK signaling, the optimal relay 
function is the hyperbolic tangent [6]. This relay function choice is optimal in this case as it has been 
shown to minimise the Mean Squared Error (MSE) at the relay [5]. Other criteria for which the optimal 
relay function is non-linear include: capacity maximisation [6], minimum error probability at the receiver 
[7]-[8], Signal-to-Noise (SNR) maximisation [5] and rate maximisation [9]. 

In general, the system identification problem may arise in several ways, see [10], [11], [12] and 
references within. For example, in an ad-hoc or sensor networks, it is possible that the destination 
does not have a priori knowledge of the relay functionality utilised by each of the relays in the system 
[13]. Alternative scenarios that this problem may arise include networks which can adapt or cognitively 
learn suitable relay transmission functionality to optimise quality of service constraints such as capacity, 
throughput, bit-error-rate and transmission power [14]. These relay functions can be either static in time 
or in more advanced relay systems they may adapt over time, reacting and updating the transformations 
applied to the received signals before re-transmission to account for time varying channel characteristics. 

Therefore, in all these scenarios, in order for the destination to perform detection of the data symbols, 
it first needs to perform estimation of the relay functionality. This is a challenging problem due to the 
uncertain functional form of each relays processing on the received signal. To address this problem we 
introduce a class of semi-parametric modelling procedures based on Gaussian processes (GP), which are 
specifically designed to solve the identification problem. 

GP define a family of stochastic processes that allow one to undertake flexible semi-parametric 
modelling of causal relationships without a priori specification of the structure of the causal relationship. 
That is we may utilise a GP regression model as a flexible family of regression models, in which the 
relationship between the predictors (independent variables) and the responses (dependent variables) is 
not specified in advance. Instead, this relationship along with the parameters of this semi-parametric 
model are learnt jointly from the observed data. This is ideal in the context of system identification, 
where there is potential for highly non-linear relay function transformations. In this context, additional 
complications arise due to uncertain inputs to the relay function from the stochastic channel and thermal 
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noise. We develop a stochastic model in which we define a distribution over a function space, corre- 
sponding to the class of continuous smooth functions /(•) £ C, which accounts for the uncertainty in 
the relay function. 

As discussed in tutorials by [15], [16] the Gaussian process literature is well established in spatial and 
temporal modelling, since the works of [17]. A few examples which include successful development of 
GP models in engineering include multi-user receiver design [18]-[19]; channel equalisation and decoding 
[20]; speech enhancement [21]; source separation [22]; forecasting of non-linear dynamic systems [23]; 
system identification of dynamical systems [24]; and non-linear identification and equalization and 
separation of signals [25]. 

A. Contributions, Outline and Notation 

The main contributions of this work are as follows: 

• We propose a stochastic model for the problem of relay identification based on a flexible semi- 
parametric GP prior on the functional form of the relay processing functionality. 

• We develop a formulation of this stochastic model under a sequential Bayesian estimation frame- 
work. In doing so we present three estimation scenarios and a comparative lower bound on their 
performance in order of complexity: 

1) the first involves the optimal identification of the unknown relay function after integration of 
the nuisance parameters. We explain how this solution is computationally expensive for on-line 
solutions; 

2) the second involves an efficient, though sub-optimal algorithm. This is based on utilizing 
knowledge of pilot symbols in each transmission frame, zero forcing of the relay noise and 
imperfect channel state information; 

3) the third solution is based on the same assumptions as the second solution but under perfect 
channel state information; 

• Finally, we develop an estimation procedure based on Maximum A Posteriori (MAP) estimation 
jointly for the unknown relay function and unknown model parameters achieved via an adoption 
of the Iterated Conditioning on the Modes (ICM) methodology of [26] to the GP stochastic models 
developed. 

1) Outline: The paper is organised as follows: in Section III we introduce a stochastic system model 
for the wireless relay system and the associated Bayesian model. In Section IV we present the optimal 
identification algorithm. In Section V we present a low complexity sub-optimal algorithms. Section VI 
presents results and analysis and conclusions are provided in Section VII. 

2) Notation: The following notation is used throughout: random variables are denoted by upper case 
letters and their realizations by lower case letters. In addition, bold will be used to denote a vector or 
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matrix quantity, upper subscripts will refer to the relay node index and lower subscripts refer to the 
element of a vector or matrix. 

II. Background on Gaussian Processes 

In this section we present a brief outline on the statistical background for Gaussian Processes. A 
realisation of a random function from a GP can be considered simplistically as the analog of an infinite 
dimensional Gaussian random vector, in which each component of the vector corresponds to a point 
of evaluation of the unknown random function. Therefore, analogously to the multi-variate Gaussian 
random vector, the infinite dimensional GP prior is characterized sufficiently by a mean function and a 
covariance function. The covariance function must be specified carefully to ensure that finite realizations 
of the process correspond appropriately to multi-variate Gaussian random vectors. 

The main idea of utilizing a GP is to work with the unknown relay functions, without parameterizing 
this function. Instead, we place a GP prior directly on the space of functions that we wish to consider 
in the system identification. The probabilistic nature of the GP model allows to define the space of 
admissible functions relating inputs to outputs, by simply specifying the mean and covariance functions 
of the process. For good introductions to GP see [15], [27], [28]. 

Definition 1 [15]: A Gaussian Process is a collection of random variables, any finite number of which 
have a joint Gaussian distribution. 

Furthermore, a GP is completely specified by the equivalent of sufficient statistics for a process, in this 
instance a mean function, denoted m(x; 6) and parameterised by 6, and a covariance function, denoted 
C (x, x'; parameterised by * [15]. 

In the context of system identification, we formulate the problem as a semi-parametric regression 
model. In particular we encode our a-priori belief in the functional form of the relay transform in terms 
of a prior distribution on a function space via a Gaussian process, denoted by the following prior 

n-)~9V(m(-,0),C(; •;*)). (1) 

Formally, this prior model ensures that for any finite set of predictor values or inputs to the unknown 
regression function {r t } t=1 . T , the corresponding random vector for the function at these points given by 
fi-.T = [f > • • • ) / (rr)] is distributed according to the following multivariate Gaussian distribution, 

fi-.T ~ V (/i : t|-Ri:T = = N ' (fi-.T] [m(ri;0), . . .,m(r T ;0)] ,K.v.t) (2) 

with each component [Ki^] ijtt = C (Ri{t) , Rj (t)) = Cov [f(R i (t))f(R j (t))]. Therefore we see that the 
covariance matrix, constructed from the kernel covariance function, measures the similarity between 
pairs of function values. Therefore, as discussed in [29] the elegance of a GP framework is that the 
properties of the unknown function to be estimated are expressed directly in terms of the covariance 
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function, rather than implicitly via basis functions such as in a basis expansion model. To summarise 
this concept, we first consider two scalar inputs denoted by Ri(t) and Rj(t), separated by a distance of 
\\Ri(t) — Rj{t)\\2, and we note that as we draw different realizations from the GP for function realizations 
at these points, f(Ri(t)) and f(Rj(t)), we will get different fluctuations depending on the function drawn. 
The degree to which this fluctuation in the values drawn occurs, is directly affected by the the choice 
of kernel. 

III. Bayesian system model and Relay Identification 

We introduce the system model and a Bayesian model for inference on the system model parameters as 
depicted in Fig. 1. We will generically denote the frame index for the i-th frame using t € {1, . . . , T}. All 
the channels are modelled stochastically, where we do not know a priori the realized channel coefficient 
values. Instead, we consider imperfect Channel State Information (CSI), in which we assume known 
statistics of the distribution of the channel coefficients. 

A. System model and assumptions 



1) Assume a wireless relay network with a single source node, transmitting sequences of K pilot symbols per 
frame. We will denote this set of pilot symbols for frame t as s = s\-k- These symbols are transmitted from 
a source to a single destination via L relay nodes. 

2) There are L relays which cannot receive and transmit on the same time slot and on the same frequency band. 
We thus consider a half duplex system model in which the data transmission is divided into two steps. In 
the first step, the source node broadcasts a code word s from the codebook to all the L relay nodes. In the 
second step, the relay nodes then transmit their signals to the destination node on orthogonal non-interfering 
channels. We assume that all channels are independent with a coherence interval larger than the codeword 
length K. 

3) Assume a general model for the CSI in which the estimates formed from the unknown realised channel 
coefficients for each relay link are known at the receiver i.e. imperfect CSI. This involves an assumption 
regarding the channel coefficients as follows: 

• From source to relay there are L i.i.d. channels parameterized by jiT^ ~ F (h® , af^j j , where F(-) 
is the distribution of the channel coefficients, and the estimated channel, h"'. 

• From relay to destination there are L i.i.d. channels parameterized by {G^ ~ F <r^) where 
F(-) is the distribution of the channel coefficients, and the estimated channel, '(f l \ 

4) The received signal at the l-th relay is a random vector given by 

R (J) =sH (i) +w (z) , I e {!,..., L}, (3) 
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where H^' is the channel coefficient (scalar random variable) between the transmitter and the l-th relay, s 
is the transmitted code-word and is the noise realization (vector random variable) associated with the 
relay. 

5) The transformation (relay function) of the received signal RW, performed by the l-th relay is assumed 
unknown. The unknown system model at each relay node I will be modelled by a distribution over a function 
space as specified by a Gaussian Process prior, 

Here f^(-) is defined to be the random vector function (relay function). A realization of this random vector 
function will be denoted by f^(R^) = f^(R^), . . . , f^(R^) , which is evaluated for the received 
signal at the l-th relay. The distribution of possible functions to be considered is controlled by the GP 
mean function /ii( (-) parameterised by and covariance function k~f m (•,•) constructed from a kernel 
parametrised by Y)^ l \ 

We denote time series observations of the function evaluation at the l-th relay 
( 



f (0 

L 1:T 



/W (H® (1)) ,...,/(') (!)),/« (iZ® (2)) ,...,/« (R 



t=l t=2 

T 



...,/« (^(d) ,...,/« (4 } co 



(4) 



t=T / 

6) To ensure a parsimonious and estimatable statistical model, particularly when L is large, we assume that all 
relay functions will have the same class of mean and covariance functions. 

7) We consider the following model structure for the relay functionality: 

• The choice of mean function considered will be restricted to linear constants and trend models of the form 
/ig ( ,)(i?P) = #P + 9^ ■ This assumption is consistent with the forms of relay function considered 
in the literature such as the Amplify-and-Forward (AT) of [30] or the Estimate-and-Forward (EF) of 
151. 

• The kernel function, KS l > of the l-th relay given in (6), can be expressed as the following squared 
exponential model: 

C(if,iff) = expf- l|Ji ' Jfr" j, (5) 

Vi,j € {1, . . . , iT}. 77ns widely used kernel produces smooth functions with the properties that the co- 
variance function is stationary and non-degenerate [15]. Using this kernel, the corresponding covariance 
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matrix for the l-th relay can be expressed as 
jAi) 



C(Rf\l),R^(T) 



C(RV\T),R%(T) 



(6) 



The corresponding mean vector for the l-th relay can be expressed as 
( 

p% ( R ?(l)),...,,% (*?(!)),/$, (<(2)) (<(2) 



u (0 



t=l 



t=2 



M g, « (T) ),..., ^U^(T) 



(0 MO, 



t=T 



(7) 



• Conditional on the mean functions and covariance functions, /A'/o (■) , C^co (-, •) and /ig^ (•) , C^ij (-, •) 
we consider realisations of each Gaussian Process function to be statistically independent. Therefore we 
are assuming the following model structure: 

E [f(0 (X) fH (Yf \4l (•) ,cg (, .) (•) (-, •)] = = 

/or a/Z X, Y inputs and all relays l,m, where X is a diagonal covariance matrix. This gives us spatial 
independence between the functionality of each relay. 
8) Conditional on matrix f = (fWfijW), . . . ,f( i )(i?( L ))) / the received signal at the destination, from the l-th 
relay, is a random vector given by 

Y (0 = f (0 (r(0) G» + V« / e {1, . . . , L} , (8) 

w/zere i/ie sca/ar random variable is the channel coefficient between the l-th relay and the receiver, 
f(0 (r®) = ( r i^ >••• j/^ ( r i^) zs the memoryless relay processing function {with possibly dif- 
ferent functions at each of the relays) and the random vector is the noise realization associated with 
the receiver. 
We define 



1:T 



1:T 



r(l) 



1:T> • • • ' x 1:T ; ' 



, w/zere 



y/ (i) , . . . , (i), (2) , . . . , (2), . . . , y»> (t) , . . . , (r) 



,(/) 



-(') 



(0 



.(0 



(9) 



t=l t=2 t=T 

9) All received signals are corrupted by zero-mean additive white complex Gaussian noise. At the l-th relay the 
noise corresponding to the l-th transmitted symbol is denoted by random variable wf' ~ C7V (0, afy. At 
the receiver this is denoted by random variable V® ~ Cj\f (O, a%). Additionally, we assume the following 
properties: 



E 



w {l) w (m) 

! 3 



E 



i j 



E 



w {l) v {m) 



0. 
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Vi, j G {1, . . . , K} ,VI,me{l,...,L},!/j,l/m, where Wj denotes the complex conjugate of Wj. 
We define 

Wi . T = ( w£L W { 3) , where 



1:T = \ W 1:T , . . . , VV 1:T j , 

W« = I wf > (1) , . . . , w£ (l)M l) (2) , • • • , W® (2), . . . , wf > (T) , . . . , (T) 



(10) 



t=l t=2 t=T 

We now present the posterior parameters required to be estimated, followed by the prior distributional 
choices, finishing with the posterior distribution for the directed acyclic graph in Fig. 2. Given the 
posterior distribution we formulate the relay system identification problem in Section IV. The posterior 
parameters and functions of interest are given by the parameter vector after observing T frames, 

(f (•) , 0, d) = ff « (•),..., f wu, 0( 1:L ), . (ii) 



The prior choices for the relay functionality for f are given by a GP with hyper priors for the mean and 
covariance functions as specified below. 
Prior Model Structure 

1) The priors of the hyper-parameters associated with the linear mean function are given by 9 = [6^, . . . , 0( L >], 



with 0W 



, where 0^ ~ N(0, 1), ~ N(0, 100) for all I. Note, here we assume a vague prior 
for the gradient of the mean function. 

2) The priors of the hyper-parameters, associated with the kernel function C in (5), considered in the construction 
of the covariance function is specified by D = [D^\ . . . , D^], where ~ U [0, 10]. 

3) The hierarchical prior for the l-th relay function is then given by f^(-) ~ QV (uqq) (•) ,C^ m (-, -)\ with 
GP mean function /A( parameterised by 0® and covariance function C^ (0 (-, •) constructed from a kernel 
parametrised by D' 1 '. 

Posterior Model Structure 

The combination of the likelihood model, priors for model parameters and symbols, and the hierarchical 
priors for the GP prior, when combined under Bayes' Theorem, results in a full posterior distribution: 

P (0,DX;T,f 1:i (-)|yi;£) 

- Up (y l &.\f&,o®D® t w« ) QV (f(0 (.) ;/ $ (•) ,c% (•, •)) p (w» ) P (*«>) P 

= n (n* (y (/) wi f(/) ( rW (*)) ,fl ro ^ (o ,w»(t)) ^ ( f « (•) ,cg (,o) 

i=i \t=i V V J J V 7 (12) 

xp(w«(t)))p(0«)p(l>«) 



Z=l V=l fc=l 



xp(w«(t)))p(0«)p(^) 



Note, we have included auxiliary parameters w{:£ to represent the unknown noise realizations at the L 
relays for each transmitted sequence of symbols. The augmentation of these auxiliary parameters in the 
posterior specification allows us to obtain closed form expressions for the likelihood model, in particular 
a Gaussian form which will be relevant when combined with the GP prior for the relay functionality. 
Without the introduction of these auxiliary nuisance parameters we would be unable to derive a closed 
form expression for the relay function likelihoods, see discussions in [31]. 

IV. Problem Formulation for System Identification and Inference 

Now we can specify the marginal posterior distributions of particular interest to the identification 
problem using the Bayesian model presented in (12). We note that typically in the standard GP regression 
framework, the values of the covariate, in our model given by r£ (t), are known or observed inputs, one 
corresponding to each of the observations of the symbols at each of the relays. However, in our model, 
these are random and unobserved variables due to the additive noise at the relay. In the following 
section we specify three different solutions to the estimation problem. 

A. Problem Formulation I 

The most general framework is based on the posterior estimation of the GP mean and covariance 
functions, parameterized by the parameters 0,D. 
System Identification Definition 1: 

{f( 1:L )(.),0 (1:L) ,5 (1;L) } = 

L / T K \ 

argmax / JT ] \ dw®(t)p (y^(t)\f (t) W°(t)) , 0®,D®, w® (t)) p (w<°(t)) (13) 

f(l:t)(.) )0 (l:I,) i £ ) (l:I,) J ^ ^ V V J J 

* (f® (•);/$) (-),cS (v))p(^)p(^). 

This solution is optimal from the perspective that it minimises the variance of the estimator due to the 
Rao-Blackwellisation performed to integrate the nuisance parameters corresponding to the relay noise. 
This problem has a high computational complexity and would typically be solved via a numerical 
procedure, such as the Markov Chain Monte Carlo (MCMC). Unfortunately, MCMC solutions would 
not easily adapt in this scenario to a computationally efficient solution, see [31]. 

We focus on an alternative class of solutions that has a much lower computational complexity, 
which admit recursive estimation algorithms. Instead of performing numerical integration followed by 
numerical optimization to perform relay identification, we solve the following problem via an iterative 
optimisation procedure. 
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B. Problem Formulation II 

Formulating computationally efficient estimation procedure, we shall make the following assumptions. 
Assumption I: Consider imperfect CSI. We condition inference on a noisy estimate of the sufficient statistics of 
the channels, given by E[G^] = g®, E[HW] = U l \ 

Assumption II: We consider zero forcing (ZF) condition for the relay thermal noise given by W' ! ' = E[W' ! '] = 0. 
As a result of these assumptions, the received signal at the relay is given by 

rg|fs,H«=^0,w»=0) = sh®, l€{l,..- ,L}, 

(14) 

Y«| (s, H« = ^0, G W = 3®, w« = o) = fO (sfr«) + v«, Z e {1, • • • , L} , 
where fOO (•) ~ GP (/$ (O.cg,, (•,•)). 

System Identification Definition 2: Under assumptions I and II, the resulting identification problem can 
be stated as the following 

|f(i^)(.) 0(1^)^(1^1 = 

L / T K \ (15) 

argmax [J II II P (*)I/ W (^C*))) ^ (f« (•) ; A (OA (■, ■)> P . 



C. Problem Formulation III 

As a quantification of the best case identification accuracy we present the following lower bound 
estimation, based on the following assumptions. 

Assumption III: Consider perfect CSI where all the channels, denoted by G® and H®, are known exactly at 
the receiver. 

Through Assumption II and Assumption III, we obtain a lower bound on the accuracy and performance 
that one can achieve. This is because they result in knowledge of the following form, 

rg| ( s, H® = h® , W® = 0)= sh® ,le{l,...,L}, 

(16) 

YW I (s, H.® = h® , G® = g® , W(" = 0) = f ® (s/iW) ff « + V« , Z € {1, . . . , L} . 
This allows us to make the following third system identification definition for the lower bound on 
performance. 

System Identification Definition 3: Under assumptions II and III, the resulting lower bound on the 
identification problem can be stated as the following 

{SP(.),sgf ) ,5£f ) } = 

argmax f[ (f[ f[ P (y®(t)\f® (r®(t)))) GV (f® (■);/$, (■) ,C® {1) (, •)) P P (d®) . 

(17) 
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We note that the estimation problem involves estimation of the Maximum a Posteriori (MAP) relay 
function values locally at each of these predictor locations given by the MAP estimator f^'(-). One may 
then utilise the results of this estimation for prediction of the unknown mean structure of the function 
/^(•), conditional on estimated hyper parameters for the mean and covariance function, at new input 
values of the received signal, i\, via the identity 



W\ mo _i_~2fr 1 „(o 



and the associated estimation error variance 



(0 A -jo 



f(Hr*)-T* {l) ) 2 \yZ\eU,n«\B«\r, 



(19) 



V. System Identification via Iterated Conditional Modes 

Having formulated the relay identification problem, we now address algorithmic procedures to solve 
efficiently (13). 

A. Understanding Iterated Conditioning on the Modes Estimation for System Identification 

ICM was originally developed in [32] and [26] for efficient MAP estimation in very high dimensional 
Bayesian models such as Markov random fields which were used in the analysis of images with speckle 
noise. Since these papers, ICM procedures have been developed in many other areas of estimation, see 
examples in [33] and [34]. 

As discussed in [35] ICM is fundamentally a deterministic optimization method that finds the joint 
posterior modal estimators corresponding to the MAP estimates. We illustrate this simple algorithm on 
a generic two parameter example developed in [35] before extending the concept to the solution to the 
relay identification problem. In this example a generic bivariate posterior distribution p(0i, O^W.t) is 
considered and the aim is to find the MAP estimate corresponding to the mode, which satisfies 

d d 

W PiOiMyi:T)\e 2 =% = w pI?xMvw)\ ^ = o- (20) 



This is equivalent to 

d d 
p(02\yi:T)-g^p{di\e 2 ,yi:T)\ e2= Q 2 = p(9i \yi:T)-^-p{02 \0 lt yi :T ) \ 6l= ^ = 0, (21) 

assuming that p{0\\yi-T) / and p(02\yi:T) 7^ 0. Hence, given the full conditional posterior distributions 
p(0i\02,yi-.T) and p{02\0i,yi-.T) and solutions for their modes 6\ = 0i(02,Z/1:t) and 62 = 02(0i,yi-.T), the 
algorithm then proceeds by initializing the mode estimates and successively iterating over updates 
of conditional mode estimates. For example at iteration j, the update of 0{ conditions on the mode 
estimate of 6 2 and then the update of 0\ conditions on the mode estimate 0\ . This is repeated 
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either for a fixed number of iterations or until a convergence criterion is satisfied. When the posterior 
full conditionals are unimodal, this procedure is guaranteed to converge to the global maximum. In 
other multi-modal settings, the ICM algorithm is guaranteed to converge to a maximum, though it may 
be a local maximum [26]. 

Hence, all variants of the ICM estimation procedure have the following in common: they first involve 
specification of a multivariate posterior distribution, deconstructed as a set of full conditional posterior 
distributions. These full conditional posterior distributions are given, for the l-th relay, based on the full 
posterior in (12) 

P (f (•) 0,n,yj;-'->) cx f[ J!* (y«(t)|/» (r«(t)) , 0« , D® , w <0(t)) QV (f« (•) ; M & (•) ,C% (; ■)) ! (22a) 

t=i fe=i 

T if 

p (o\r>,i l (•) , y « ) oc J] II ^ ( f(0 (•) ;A (•) ,cg •)) p (e (0 ) ; (22b) 

t=i fc=i 

P ( D |fl, f (•) , y% ) oc fl ft ST (f (0 (•) ; C> (') . C So (•> •)) P • (22c) 

t=l fc=l 

Next, given these full conditional posterior distributions, and observations for the l-th relay, applying 
the ICM algorithm involves initializing the estimate of the MAP solutions, denoted by ff['.y \ 6^°\ D(°^ . 
Then the j-th iteration of the ICM algorithm successively updates each estimate of ff''.y \ 0® , J based 
on the solutions at iteration (j — 1), and the solutions to the following sequence of MAP estimates for 
the full conditional posteriors, 

VecffJ;^] = argmaxp (V (•) ^""^.D^.ySlU ; (23a) 
= argmax:p (^ifj^" 1 ), fj'.^, ygU ; (23b) 
D (i) = argmaxp (d|0 (j) , fj;^, y[ r ) . (23c) 

Repeating this procedure can guarantee convergence to a maximum, see [26], who noted that ICM 
can be made equivalent to instantaneous freezing in a stochastic optimization procedure known as 
simulated annealing [36]. Iterating this procedure successively produces a sequence of MAP estimates 
which converge to an optimum solution, typically for a small number of iterations J. Additionally, the 
number of ICM iterations J required to obtain the optimal MAP solutions for each set of T frames of 
K symbols in our problem formulation is designed to be independent of T and K. This is due to the 
conjugacy we exploit for the model MAP estimation of the vector component which grows linearly in 
dimension with KT, corresponding to the vector Vec[f{. T ]. 

B. Generic ICM Estimation for Wireless Relay System Identification via GP's 

To develop an ICM algorithm, we need to construct a block Gibbs framework for posterior inference, 
and obtain expressions for the mode of the full conditional posterior distributions. This is equivalent 
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to finding the conditional MAP estimate. In this problem we will exploit conjugacy properties of the 
posterior model in (13). 

Theorem 2: The full conditional distributions under (14) are given by: 

1) The full conditional posterior distribution for the l-th relay function in (22a ) is given by 



p ( Vec 
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The conditional MAP estimate of the relay function in (23a), denoted by Vec if/ T , is given by 
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2) The fuil conditional for 0® in (22b) can be expressed as 
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in (23b), denoted by 6^, is given by 
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3) The full conditional for in (22c) can be expressed as 



F (0 

L 1:T 



(32) 



The conditional MAP estimate of in (23c), denoted by D®, is given as the numerical solution to 

_i <1ICi;t\ 1 
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where I [■] is the indicator function, and we have 

dKi-T 1 



dD(0 £>(*) 



and is the Kronecker product, and the i, j-th element of $i-.t is given by [$i : r 
The proof of Theorem 1 is provided in Appendix 1. 
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C. Recursive 1CM-GP System Identification Algorithm 

We develop a version of Iterative Conditioning on the Mode (ICM) [37] which exploits different 
structural properties of the GP model. 

We begin with the most computational approach, though it represents the optimal solution as it 
utilises all information obtained to update the estimated function. 
Online Estimation I - Full Information 

This approach is optimal since, for each relay I, having observed t-frames of i'T-symbols of observed 
data y$ = (y} 1) (1), . . . ,Y$(1), . . . , Y} l \t), Y®(t)\ we evaluate based on 

c(Rf\l),RP(l)) ■■■ c(l$\l),R®(t)" 



C[R®{t),B®{t) 



(35) 



C[R^(t),R?(l)^ 

This approach utilises all the observed information in the estimation of the ICM algorithm. However 
for J iterations of the ICM algorithm it will be of complexity 0(JK 2 t 2 ) in memory usage and 0(JK 3 t 3 ) 
in computational complexity. This is primarily due to the cost of inverting the Gram matrix in each 
update stage of the ICM algorithm. However, in this approach each estimation is performed based on a 
new frame of length K of observed symbols and the estimation of the matrix of function values, f[/ t , is 
completely updated based on all past history. For example, the evaluation of (25) for updating the ICM 
MAP estimate of the function values is given by, 
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^1?T ) ~i 
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M + -^Vec 
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This expression involves the evaluation of k!(\ and its inverse, which become rapidly computationally 
expensive though analytically exact for the update of the MAP estimate of the function values f[ l ) t at 
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each iteration of the ICM algorithm. We circumvent this growing computational complexity with two 
alternative approaches presented next and compare their performance to this optimal solution. 
Online Estimation II - Frame-by-Frame 

This approach is sub-optimal though significantly more computationally efficient. In this proposed 
approach, for each relay I, having observed the t-th frame of K-symbols = \ Y^\t), . . . , Yjp(t)\ we 
evaluate K,f) t based on just the current frame of data, 

c(R { l\t),R«\t) 
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Effectively it is equivalent to partitioning K\. t as follows, 
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However, for J iterations of the ICM algorithm at frame t it will be of complexity 0{JK 2 ) in memory 
usage and 0(JK 3 ) in computational complexity. In addition, since the relay functions being estimated 
are not time varying, the estimates obtained on previous frames (f}f^_ 1 ) and their uncertainty can 
be combined in several different approaches for example via an update according to the following 
recursions, 



m t = m t -i + - (ff ) 



:(0 



mt-i 



m t 



m t 



t-i 



(38) 



Here, the matrix of function estimates, denoted by , corresponds to function values obtained for 
frame t from the ICM estimates if \ quantised to a grid of predictor values R±,...,Rs which partition 
the convex hull of the relay symbols based on the constellation being transmitted. These quantised 
values are then included in an average of the function values for each frame, where the mean function 
vector at these quantised grid points is denoted by m t and its uncertainty measured by a covariance in 
the estimate at these quantised values is denoted by <&t. 
Online Estimation III - Sliding Window 

This approach provides a trade-off in computational complexity and optimality between the proposed 
estimation approaches I and II. It is sub-optimal, though significantly more computationally efficient than 
approach I and it recursively uses previous frames as opposed to the block wise analysis in approach 
II. 
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Under this proposed approach, for each relay I, having observed r symbols of observed data ly 



(0 



y/^(l), . . . , Yg } (l), . . . , Y^'{t) \, we evaluate fC[! t according to a sliding window based on the past 
block of 5 observed symbols. For example we may consider blocks of length K (i.e. length of the frame) 
and an update of the function estimates for each newly observed symbols observation. This corresponds 
to utilizing a Gram-matrix for each update with structure given by, 



-(I) 



(0 

T-l 



(39) 



kV^RV^f C(R®(t),R®(t)) 
where we have dropped the frame index on the received signals which is redundant in this specification. 
We define the last row and column of the new matrix as the vector 

fc®! (R®(t)) = [C (R( 1 \t -5 + 1), R®(t)) ,...,C (R®(t - 1),R®(t))] . In addition the modified matrix 
iCj)-\ corresponds to the "down sized" version of the regularised matrix lOp-i obtained by removing 
the first row and column, ensuring the sliding window structure and the fixed dimensionality of /C T 
for all times r. 

This particular approach is computationally efficient as it allows one to utilise a special computational 

evaluation of the inverse matrix ]Cp in (25) when updating the ICM MAP estimates. This involves 

r m 1 _1 

utilising recursive knowledge of the previous updates matrix and its inverse JC).^ under an approach 
described in [25]. This approach utilises the Sherman - Morrison - Woodbury matrix inversion Lemma 
[38] under two applications detailed below. 

1) Evaluate the inverse of the down sized matrix iC^-i- 

2) Evaluate the inverse of the up sized matrix fZ? based on the inverse of K%-v 

Dropping the relay index I for convenience, the inverse of the down sized matrix is obtained using 
the previous evaluation of the matrix, decomposed according to 

C {R(t -S + 1),R(t-S + 1)) jfer-i (R(t - 1)) 

fc T _l (R(T - 1)) £ r _x 



/C r _i = 

and its inverse decomposed as 
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where [^r-i] 22-55 denotes the lower (2 x S)(2 x 5) sub-block of the inverse matrix Under these 

decompositions, the inverse for the down sized matrix is obtained using the following matrix inversion 
lemma identity [39] 



T-l 



1 



[^r-l] 12:15 [^r-l 



T 



122:55 [/C -1 ]] L T_1J 12:15 L'^T-IJ 12:15 ' 

Having obtained the inverse of the down sized matrix in terms of values known at iteration (r — 1), the 
inverse of the upsized matrix proceeds according to the following decompositions again via application 
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(42) 



of the matrix inversion lemma. Again, dropping the relay index I for convenience, the inverse of the up 
sized matrix is obtained using the previous evaluation of the matrix, decomposed according to 

/C T _i k T (R(t)) 

^ k T (R(r)f C(R(t),R(t)) 

and its inverse decomposed as 

i^r ] 11:(5-1)(5-1) \J^r ] 12:15 
[^t 1 ] 12:15 [^ 1 ]s5 

where [/C^ 1 ] u.(s-i)(S-i) denotes the upper (2 x5)(2x S) sub-block of the inverse matrix K,~ x . Applying 
the matrix inversion lemma, one can obtain further decomposition according to the previously evaluated 
down sized matrix inverse as follows, 
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with [JC- 1 ] ss = [C (R(t), R{t)) - k T (R(t)Y K-\k T (R(t))^ 

In addition, since the relay functions being estimated are not time varying, as detailed in Approach 
II, the estimates obtained on previous frames (fj''.).^) and their uncertainty can be combined in several 
different approaches for example via the update mechanism described in (38). 

VI. Simulation Results 

In this section, we present the performance of the proposed algorithms via Monte Carlo simulations. 
The simulation settings for all the simulations are as follows: 

• The prior distribution for all the channels is Rayleigh fading, and the channels are assumed to be 
both spatially and temporally independent; 

• The channels uncertainty was set to <7q = <7p = 0.2; 

• We define the received SNR as the ratio of the average received signal power to the average noise 
power, 



SNR = 10 log 
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. The SNR is set to dB (low SNR) and 10 dB (high SNR); 

• The results are obtained from simulations over T = 100 transmitted frames with K = 200 symbols 
per frame; 

• In all simulations 16PAM constellations were considered; 

• The ICM algorithm iterated J = 50 times over the solutions to (23a-23c); 

• The relay functions tested correspond to: absolute value f(x) = \x\, linear (affine transformation) f(x) = 
ax + b, hyperbolic-tan f(x) = atanh(ti;x + <p) and demodulated; 
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• Simulations were performed for full kernel matrix estimation Approach I under both perfect CSI and 
imperfect CSI; partial kernel matrix estimation Approach II with frame by frame estimation under 
both perfect CSI and imperfect CSI; and a sliding window with 50% overlap estimation Approach 
III under both perfect CSI and imperfect CSI. 

We begin with an analysis for the case in which the algorithm is applied to an increasing number 
of frames of transmitted symbols, according to the Full Information (Approach I) in which the kernel 
matrix is not truncated in any manner. These are the results with the highest computational complexity 
but without any reduction of the kernel matrix, hence these results were produced form the full set of 
observed information over time, as presented diagrammatically in Subplot (a) in Fig. 3. 

Analysis of the performance of the ICM algorithm can be studied in several ways. We first present 
in Fig. 4 representative results for the estimation of the MAP model hyper-parameters #2> D), jointly 
estimated with the relay function identification under ICM. We present these estimates versus the ICM 
iterations j = 1, . . . , 50. Two important features are evident, the first that the results converge to a set of 
optimal values and secondly that this occurs relatively rapidly, with very few iterations of ICM required. 
This is characteristic of all the examples we tested. 

Next we summarize the findings for the perfect CSI and imperfect CSI studies of each of the different 
relay functions: absolute function in Fig. 5, linear function in Fig. 6, tanh function in Fig. 7 and demod- 
ulated function in Fig. 8. These results are presented each in four sub-plots, the first two are obtained 
from the Full Information (Approach I) and the last two are for the Frame-by-Frame (Approach II). The 
results we present are the posterior MAP via the GP estimation in (18). In addition, in grey we present 
the 95% posterior confidence intervals on these MAP estimates via (19). We compare these estimates to 
the true relay functional form utilised in the simulations. 

• For all the non-linear relay functions considered, the estimation under both perfect and imperfect 
CSI is highly accurate for the Full Information (Approach I). 

• The demodulated relay function which had the linear trend with stairs overlayed, was most difficult 
to perform system identification as it contained a global feature of the linear trend as well as local 
fine scale features corresponding to the stairs function. We observed that in the Full Information case 
with perfect CSI, the estimation was relatively accurate. However, for the imperfect CSI settings, 
the estimation of the global feature of the trend was evident, though the resolution of the local 
features of the stairs was diminished. Therefore, learning such intricate features will require many 
more frames of estimation. The results presented were for 100 frames. With an increase over time 
to 500 or more, the estimation will resolve these local features. 

• As expected in any regression based anlaysis, the functional forms were most difficult to estimate 
at the extremities of the convex hull of the received PAM constellation points. This can be shown 
to result in the largest predictive uncertainty in the estimated function, leading in this case to most 
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uncertainty in the estimated relay functional forms. This was observed in all settings and for all 
functions, and is most poignant in the Demodulated function example. 
• In the Frame-by-Frame results we evaluate the function at fixed grid points set by the constellation 
transmission symbols space. In this case we observe a smoothing of the estimates which can help 
resolve the local resolution when compared to the estimation of the relay function at each observed 
constellation point as undertaken in Approach I. This smoothing approach could also be applied to 
Approach 1. 

Next, we present summarised results in Table I, which are each based on the absolute error in the 
MAP estimated relay functions under Approaches I, II and III over 100 frames. We observe that, as 
expected, there is a clear trade-off between computationally efficiency and accuracy in the estimation. 
The Full-Information (Approach I) is the most accurate. The Overlapping-Sliding-Window, which in this 
simulation had a 0% overlap, was the least accurate, though it was the most computationally efficient 
approach corresponding to Frame-by-Frame estimation. 

Finally we present Bit Error Rate (BER) results for the case of linear relay function. The results 
are presented in Fig. 9 for both Full-Information (Approach I) and Frame-by-Frame (Approach II). For 
comparison, we also plot the BER for the case of perfect CSI and perfect knowledge of the relay function 
which serves as a lower bound on the BER performance. As the figure shows, there is less than ldB 
difference between the lower bound and Approach I under perfect CSI; and less than 3 dB difference 
between the lower bound and Approach II under perfect CSI. In the imperfect CSI case there is a further 
2 dB degradation in both cases. These results demonstrate the importance of accurate estimation of the 
relay function. 

VII. Conclusions 

We considered the problem of relay identification in wireless relay systems. We developed a flexible 
stochastic model for a class of cooperative wireless relay networks, in which the relay processing 
functionality is not known at the destination. Working under this modelling framework we developed 
and demonstrated the performance of our estimation procedure aimed at performing efficient system 
identification. 

In particular we demonstrated that Gaussian Process modelling via ICM approach can resolve the 
problem of system identification in a computationally efficient algorithm for many different relay func- 
tional forms which have desirable characteristics with respect to transmission functionality and quality 
of service. 
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VIII. Appendix 

Here we provide a proof for the expressions in Theorem 1. 
Proof: The posterior distribution decomposes according to (12), and we can therefore derive the 
following quantities for a given relay 

1) The full conditional posterior distribution for the l-th relay function in (22a) is given by 

With a matrix variate normal likelihood model for yf.^lfi-r an d a GP prior on the function will 
result in a matrix variate prior for the function over the symbols in each frame, i.e K x T. After 
vectorizing the observation random matrix and the prior random matrix, we obtain multi variate 
Gaussian distributions which admit standard conjugacy properties, see [40]. This results in 

p(vec[fS] \0W,DW,y&) = iV(M,E), (46) 

where M and S are defined in (25-27). 
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2) The full conditional for 0® in (22b) can be expressed as 
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Deriving the MAP estimate is achieved by applying the log transform to the posterior, taking 
partial derivative with respect to each parameter, setting to and solving as follows: 
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It produces the following linear system of equations with a unique solution: 
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Here, 
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3) The full conditional for in (22c) can be expressed as 

P fD«|fS,^, y f) T )ocivfVec[f«l ;A,4]p(d (!) 
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where we clarify that /c[^ T is implicitly a function of = D^ l \ and the kernel choice given in 
Section III-A item 8 ensures that this parameter is a scalar random variable. 

Deriving the MAP estimate is achieved by applying the log transform to the posterior, taking 
derivative with respect to DW, setting to and solving as follows: 
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It ( 'r- 1 rf ^ 1:T 
'2 V UT dD(D 



-(Vec[fi : T] -fJ.i-.Tj (/Ci : t) -jtvtT (^i:t) ( Vec [fl :T ] - tH:T 



£ w G (a, 6) 
I 



£>W g (a, 6 
(54) 



This expression is obtained by using standard matrix derivative identities. 



23 




1: System model for transmission from one source, through L relay channels to the destination. 
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Fig. 2: Directed Acyclic Graphical Model structure for the hierarchical Bayesian model. 
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(a) Frame-by-Frame 



(b) Sliding window 



Fig. 3: Construction of Kernel matrix under Frame-by-Frame (Approach II) and Sliding Window (Approach III). 



Fig. 4: Convergence of hyperparameters estimates 0[ , 9^ and D 





Full Information 


Sliding Window 


Frame-by-Frame 




perfect CSI 


imperfect CSI 


perfect CSI 


imperfect CSI 


perfect CSI 


imperfect CSI 


Relay 
function 


high 
SNR 


low 
SNR 


high 
SNR 


low 
SNR 


high 
SNR 


low 
SNR 


high 
SNR 


low 
SNR 


high 
SNR 


low 
SNR 


high 
SNR 


low 
SNR 


ABS 


0.91 


3.15 


0.97 


3.37 


1.26 


3.65 


1.43 


3.83 


1.29 


3.69 


1.52 


3.93 


LINEAR 


0.57 


0.76 


0.76 


0.83 


1.07 


1.24 


1.11 


1.31 


1.15 


1.24 


1.26 


1.43 


TANH 


0.18 


0.21 


0.19 


0.22 


0.2 


0.23 


0.22 


0.24 


0.28 


0.31 


0.33 


0.42 


DEM 


0.83 


0.88 


0.85 


0.91 


1.11 


1.32 


1.21 


1.42 


1.23 


1.45 


1.51 


1.74 



TABLE I: Comparison of absolute error in MAP estimation for relay functions under each estimation 
approach, for perfect and imperfect CSI. 
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(a) Approach I: perfect CSI (b) Approach I: imperfect CSI (c) Approach II: perfect CSI (d) Approach II: imperfect CSI 

Fig. 5: ABS relay function 







(a) Approach I: perfect CSI (b) Approach I: imperfect CSI (c) Approach II: perfect CSI (d) Approach II: imperfect CSI 



Fig. 6: LINEAR relay function 
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(a) Approach I: perfect CSI (b) Approach I: imperfect CSI (c) Approach II: perfect CSI (d) Approach II: imperfect CSI 



Fig. 7: TANH relay function 
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Lower bound 

e - Perfect CSI, Approach I 
■ a - Imperfect CSI, Approach I 
-o— Perfect CSI, Approach II 
-h— Imperfect CSI, Approach II 



10 15 

SNR [dB] 



Fig. 9: BER for Amplify-and-Forward relay estimation under perfect /imperfect CSI 
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